Abstract

Expansion and contraction of microRNA (miRNA) families can be studied in sequenced plant genomes through sequence alignments. Here, we focused on miR169 in sorghum because of its implications in drought tolerance and stem-sugar content. We were able to discover many miR169 copies that have escaped standard genome annotation methods. A new miR169 cluster was found on sorghum chromosome 1. This cluster is composed of the previously annotated sbi-MIR169o together with two newly found MIR169 copies, named sbi-MIR169t and sbi-MIR169u. We also found that a miR169 cluster on sorghum chr7 consisting of sbi-MIR169l, sbi-MIR169m, and sbi-MIR169n is contained within a chromosomal inversion of at least 500 kb that occurred in sorghum relative to Brachypodium, rice, foxtail millet, and maize. Surprisingly, synteny of chromosomal segments containing MIR169 copies with linked bHLH and CONSTANS-LIKE genes extended from Brachypodium to dicotyledonous species such as grapevine, soybean, and cassava, indicating a strong conservation of linkage between certain flowering and/or plant-height genes and microRNAs. This conservation may explain linkage drag of drought and flowering traits and would have consequences for breeding new varieties. Furthermore, alignment of rice and sorghum orthologous regions revealed the presence of two additional miR169 gene copies (miR169r and miR169s) on sorghum chr7 that form an antisense miRNA gene pair. Both copies are expressed and target different sets of genes. Synteny-based analysis of microRNAs among different plant species should lead to the discovery of new microRNAs in general and contribute to our understanding of their evolution.
Introduction

Several mechanisms have been proposed to explain the evolutionary origin of microRNA (miRNA) genes. For instance, they can be derived from miniature inverted-repeat transposable elements (MITEs), because an inverted repeat with a short internal sequence can be transcribed and form a hairpin structure that can be processed into small RNAs. Indeed, several miRNA genes derived from MITEs have been described in Arabidopsis and rice (Piriyapongsa and Jordan 2008). It has also been proposed that miRNA genes can originate from spontaneous mutations in hairpin-like structures in the genome, and several miRNAs in Arabidopsis appear to have originated this way (Fenselau de Felippes et al. 2008). The third, and probably the most widely accepted, explanation for the origin of microRNAs is the inverted duplication of genes, which when transcribed would form hairpin structures capable of generating small RNAs with perfect complementarity to the parental transcripts (Allen et al. 2004; Axtell and Bowman 2008). Over time, the accumulation of mutations erodes the extensive homology with the parental transcripts and the accuracy of small RNA processing improves, eventually leaving a single segment (the mature miRNA) that retains complementarity (Allen et al. 2004; Axtell and Bowman 2008). This hypothesis is supported by evidence that extended complementarity between plant miRNAs and target mRNAs is more evident in less-conserved, younger loci (Fahlgren et al. 2007).

Duplication of a newly formed miRNA eventually results in the creation of a multigene miRNA family: evolutionarily old and conserved miRNAs have more than one gene copy in the genome, whereas new, and thus nonconserved (or species-specific), miRNAs are usually single copy (Allen et al. 2004; Fahlgren et al. 2007; Ma et al. 2010). Similar to protein-coding genes, duplication and subsequent divergence of miRNA gene copies can lead to loss of function (pseudogenes), retention of the current function (gene redundancy), gain of a new function (neofunctionalization), or acquisition of a more specialized function (subfunctionalization) (Maher et al. 2006). Consistent with this, diversification in the sequence of duplicated miRNA gene copies was accompanied by changes in spatial and temporal expression patterns (Jiang et al. 2006; Maher et al. 2006). Tandem duplication of microRNA genes produces paralogous miRNA gene copies located close to each other on the same chromosome, thus forming miRNA clusters. Recently, Sun et al. (2012) analyzed miRNAs that had amplified through tandem duplication in the Arabidopsis, poplar (Populus trichocarpa), rice (Oryza sativa), and sorghum (Sorghum bicolor) genomes and found that a total of 248 miRNAs belonging to 51 miRNA families arose by tandem duplication. This study showed the importance of tandem duplication as a major force in the creation of new miRNA gene copies and in the expansion of miRNA families. Interestingly, the average miRNA copy number in tandemly duplicated regions was lower in the eudicots A. thaliana and P. trichocarpa (2.8 copies/tandem) than in the monocots O. sativa and S. bicolor (3.4 copies/tandem), suggesting that tandem duplications might have been more common in rice and sorghum (Sun et al. 2012). Despite this finding, little is known about the evolutionary fate of miRNA gene clusters across the grass family.

Here, we analyzed the process of tandem duplication that gave rise to MIR169 gene clusters in sorghum (S. bicolor [L.] Moench) and traced its evolutionary path by aligning contiguous chromosomal segments of diploid Brachypodium, rice, and foxtail millet, and the two homoeologous regions of allotetraploid maize. We chose miR169 as an example because of its possible role in stem-sugar accumulation in sorghum, in addition to its previously described role in the drought stress response in several plant species. We discovered allelic variation in miR169 expression between grain and sweet sorghum, suggesting that miR169 could also play a role in the sugar content of sorghum stems (Calvino et al. 2011). Although high sugar content in stems is a trait shared by sorghum and sugarcane (Calvino et al. 2008, 2009), this trait seems to be silent in other grasses (Calvino and Messing 2011). This prompted us to investigate the evolution and dynamic amplification of miR169 gene copies in grass genomes. We found that synteny of chromosomal segments containing MIR169 gene copies was conserved between monocotyledonous species such as Brachypodium and sorghum but, surprisingly, also across the monocot-dicot barrier in dicotyledonous species such as grapevine, soybean, and cassava. Furthermore, linkage of MIR169 copies with a bHLH gene similar to Arabidopsis bHLH137 and with a CONSTANS-LIKE gene similar to Arabidopsis COL14 was conserved in all the grasses examined as well as in soybean and cassava (linkage between MIR169 and bHLH genes) and grapevine (linkage between MIR169 and COL14 genes). We discuss the importance of this finding for breeding crops with enhanced bioenergy traits.



Materials and Methods 

DNA Sequences

Rice sequences were downloaded from the Rice Annotation Project Database website (http://rapdb.dna.affrc.go.jp/), whereas Brachypodium, foxtail millet, sorghum, maize, grapevine, soybean, and cassava sequences were downloaded from the Joint Genome Institute website (www.phytozome.net). MicroRNA sequences were downloaded from the miRBase database (http://www.mirbase.org/).

MIR169 Gene Prediction and Annotation 

Stem-loop precursors (hairpin structures) of previously annotated MIR169 genes were used in reciprocal Blastn analyses while creating synteny graphs. Previously known MIR169 stem-loop precursors were used as Blastn query sequences. When a hit matched a genomic region with no previous annotation of a MIR169 gene copy, we took a 100-300 bp segment and submitted it to an RNA folding program (RNAfold web server: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) to look for the hairpin-like structures typical of microRNA precursors. We followed the guidelines for microRNA gene prediction suggested by Meyers et al. (2008).
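The step of cutting a 100-300 bp segment around a Blastn hit before folding can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the chromosome sequence is already loaded as a string, that hit coordinates are 1-based and inclusive as in BLAST tabular output, and that minus-strand hits should be returned as the reverse complement. The function names and the 200 bp default window are hypothetical choices within the stated range; folding itself was done on the RNAfold web server and is not reproduced here.

```python
# Hypothetical helpers; coordinates are 1-based and inclusive (assumption).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_segment(chrom_seq: str, hit_start: int, hit_end: int,
                      strand: str, window: int = 200) -> str:
    """Cut a window of `window` bp centred on a Blastn hit, ready for folding.

    The window must lie in the 100-300 bp range used in the text; near a
    chromosome end it is shifted inwards rather than truncated.
    """
    if not 100 <= window <= 300:
        raise ValueError("window should be between 100 and 300 bp")
    lo, hi = sorted((hit_start, hit_end))  # minus-strand hits may be reported end-first
    centre = (lo + hi) // 2
    start = max(1, centre - window // 2)
    end = min(len(chrom_seq), start + window - 1)
    start = max(1, end - window + 1)       # re-anchor if the right-hand end was reached
    segment = chrom_seq[start - 1:end]
    return reverse_complement(segment) if strand == "-" else segment
```

For example, `candidate_segment(chrom, 150, 170, "+", 100)` returns the 100 bp from position 110 to 209. Shifting the window instead of truncating it keeps every candidate the same length, which makes folding energies easier to compare.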

Experimental Validation of Predicted MIR169 Genes 

We took advantage of our previously sequenced small RNA libraries from sorghum stems (Calvino et al. 2011) and mapped small RNAs to the newly predicted sorghum MIR169 hairpin sequences. To validate the newly predicted MIR169s in maize, we used the SOLiD platform to sequence small RNAs from endosperm tissue of the B73 and Mo17 inbred lines as well as from endosperm tissue of their reciprocal crosses. Small RNA reads were then mapped to the zma-MIR169s stem-loop precursor.
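Mapping small RNA reads onto a predicted hairpin can be illustrated with exact, sense-strand string matching. This is a hedged sketch, not the mapping software used for the sorghum or SOLiD libraries: it assumes reads have already been collapsed into a dictionary of distinct sequences and their abundances, and it allows no mismatches. Both function names are hypothetical.

```python
def map_reads_to_hairpin(hairpin: str, read_counts: dict[str, int]) -> dict[tuple[int, str], int]:
    """Exact sense-strand matches of small RNA reads to a stem-loop precursor.

    read_counts maps each distinct read sequence to its abundance.
    Returns {(1-based start on the hairpin, read): abundance}; a read matching
    at several positions is reported at each of them.
    """
    target = hairpin.upper().replace("U", "T")  # compare everything as DNA
    hits: dict[tuple[int, str], int] = {}
    for read, count in read_counts.items():
        query = read.upper().replace("U", "T")
        pos = target.find(query)
        while pos != -1:
            key = (pos + 1, query)
            hits[key] = hits.get(key, 0) + count
            pos = target.find(query, pos + 1)
    return hits


def length_profile(hits: dict[tuple[int, str], int]) -> dict[int, int]:
    """Total mapped abundance per read length (e.g., 21 nt versus other sizes)."""
    profile: dict[int, int] = {}
    for (_, read), count in hits.items():
        profile[len(read)] = profile.get(len(read), 0) + count
    return dict(sorted(profile.items()))
```

The length profile gives a quick view of whether reads on a candidate hairpin are dominated by a 21-nt species, as expected for a processed miRNA, or spread over other size classes.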

Prediction of miR169 Targets

Target prediction was conducted in sorghum for the newly discovered miR169r* and miR169s microRNAs using the Small RNA Target Analysis Server psRNATarget (Dai and Zhao 2011) at http://plantgrn.noble.org/psRNATarget/. In addition to the sorghum transcripts preloaded in psRNATarget (Sorghum DFCI Gene Index SBGI Release 9), we uploaded a FASTA file from Phytozome (http://www.phytozome.net/dataUsagePolicy.php?org=Org_Sbicolor) containing the coding sequences of all sorghum genes and used this data set for target prediction as well. Target prediction was conducted for the annotated 21-nt miR169 and for the most abundant small RNA reads other than 21 nt in size that matched the predicted miR169 sequence (miR169 variants).
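The idea behind complementarity-based target prediction can be illustrated with a simple, ungapped pairing count between a small RNA and a candidate target site. This is explicitly NOT psRNATarget's scoring scheme (which applies position-dependent penalties and other criteria); it is a hypothetical sketch that only counts Watson-Crick pairs, G:U wobbles, and mismatches, assuming both sequences are given 5'->3' and have equal length.

```python
# Simplified pairing summary; not psRNATarget's algorithm (assumption-based sketch).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_summary(mirna: str, site: str) -> dict[str, int]:
    """Count pairs between a small RNA (5'->3') and an equally long target
    site (5'->3' on the mRNA). DNA letters are converted to RNA."""
    m = mirna.upper().replace("T", "U")
    s = site.upper().replace("T", "U")
    if len(m) != len(s):
        raise ValueError("this ungapped sketch needs equal-length sequences")
    counts = {"match": 0, "wobble": 0, "mismatch": 0}
    # Antiparallel pairing: the miRNA 5' end faces the 3' end of the site.
    for a, b in zip(m, reversed(s)):
        if (a, b) in WATSON_CRICK:
            counts["match"] += 1
        elif (a, b) in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts
```

Because a 21-nt miR169 and a shorter or longer variant pair with differently sized sites, each read length would need its own equal-length site in this sketch; gapped alignment is outside its scope.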
Estimation of MIR169 Gene Number in Ancestral Species

To estimate the number of MIR169 genes in ancestral species of the grass family, together with the gains and losses of MIR169 copies during grass evolution, we used the parsimony approach described previously by Nozawa et al. (2012).
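A generic way to infer ancestral copy numbers by parsimony is a Sankoff-style small-parsimony pass in which changing a cluster by one copy costs 1, so a branch from k to j copies costs |k - j|. The sketch below uses that formulation. It may differ in detail from the procedure of Nozawa et al. (2012), and the function name, the nested-tuple tree encoding, and the `max_copies` bound are assumptions made for illustration.

```python
import math
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def copy_number_parsimony(tree: Tree, counts: dict[str, int], max_copies: int = 10):
    """Minimum number of single-copy gains plus losses explaining leaf counts.

    Returns (minimum total changes, optimal ancestral copy numbers at the root).
    """
    states = range(max_copies + 1)

    def cost_vector(node):
        if isinstance(node, str):  # leaf: only its observed count is allowed
            return [0 if k == counts[node] else math.inf for k in states]
        left, right = node
        cl, cr = cost_vector(left), cost_vector(right)
        # For each parental state k, pick the cheapest child state j on each branch.
        return [min(cl[j] + abs(k - j) for j in states)
                + min(cr[j] + abs(k - j) for j in states) for k in states]

    root = cost_vector(tree)
    best = min(root)
    return best, [k for k in states if root[k] == best]
```

With the tree ((Brachypodium, rice), (foxtail millet, sorghum)) and the copy numbers described in Results for the region collinear with the sorghum chr2 cluster (0, 2, 3, 3), the root has two equally parsimonious states (two or three copies, three changes each), which parallels the two alternative models discussed for that cluster.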

Estimation of Substitution Rates in MIR169 Genes and 
Ancient Duplication Time 

To study the rate of nucleotide substitution in MIR169 genes, we aligned MIR169 stem-loop sequences using MUSCLE, available in the MEGA5 software package (Tamura et al. 2011). For the gained MIR169 copy that gave rise to the sit-MIR169h, sbi-MIR169y, and zma-MIR169s copies (fig. 6A: region of the miR169 cluster on sorghum chr2), we first computed the average Jukes-Cantor distance ($D_0$) over the zma-MIR169s/sbi-MIR169y and zma-MIR169s/sit-MIR169h gene pairs. The substitution rate ($R$) was then calculated as $R = D_0/(2T)$, where $T$ is the divergence time (here 26 million years ago [Ma], when the ancestor of maize and sorghum diverged from foxtail millet). We then calculated the ancient duplication time at which sit-MIR169h arose as $t = d/(2R)$, where $t$ is the divergence time of two sequences and $d$ is the average distance between sequences in the miR169 cluster (the average of the pairwise distances for sit-MIR169h/sit-MIR169g and sit-MIR169h/sit-MIR169f). A similar rationale was applied to calculate the ancient duplication time of sbi-MIR169t in sorghum miR169 cluster 1 (fig. 6A).
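The distance and timing calculations in this subsection (Jukes-Cantor distance, then R = D_0/(2T) and t = d/(2R)) can be written as a short sketch. Assumptions, not taken from the study: gap columns are skipped pairwise, and the distances in the usage note are hypothetical numbers chosen only to show the arithmetic, not values measured for MIR169.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3), defined for 0 <= p < 0.75."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def substitution_rate(d0: float, T: float) -> float:
    """R = D0 / (2T): substitutions per site per lineage per unit of T (here Ma)."""
    return d0 / (2 * T)


def duplication_time(d: float, R: float) -> float:
    """t = d / (2R): time since two paralogous copies diverged."""
    return d / (2 * R)
```

For instance, hypothetical distances of D_0 = 0.26 with T = 26 Ma give R = 0.005 substitutions per site per Ma, and a within-cluster distance of d = 0.411 then gives t of about 41.1 Ma. Because both formulas share the factor 2, the estimate reduces to t = T·d/D_0.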

Rate of Synonymous and Nonsynonymous Substitutions 
of the bHLH Orthologous Gene Pairs 

We used exon sequences to estimate synonymous and nonsynonymous substitutions with the MEGA5 program (Tamura et al. 2011). Synonymous and nonsynonymous substitution rates were calculated for each bHLH orthologous gene pair (Brachypodium-rice, Brachypodium-foxtail millet, Brachypodium-sorghum, and Brachypodium-maize), and the Brachypodium bHLH gene Bradi3g41510 was compared with the HLH gene Bradi4g34870.

Phylogenetic Analysis 

Phylogenetic analyses were performed by creating multiple alignments of nucleotide or amino acid sequences with MUSCLE and ClustalW, respectively, and phylograms were drawn with the MEGA5 program using the neighbor-joining (NJ) method (Tamura et al. 2011). Multiple alignments of microRNA169 stem-loop sequences were improved by removing unreliable regions with the web-based program GUIDANCE (http://guidance.tau.ac.il), and NJ phylogenetic trees were built with 2,000 bootstrap replications using the maximum composite likelihood model.
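The bootstrap part of this procedure resamples alignment columns with replacement; each pseudo-alignment is then used to rebuild the tree, and a clade's support is the percentage of replicate trees that contain it. The sketch below shows only the resampling step, as an illustration under stated assumptions (alignment held as a name-to-sequence dictionary, hypothetical function names, fixed seed for reproducibility); tree building itself was done in MEGA5 and is not reproduced.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence, so sites stay aligned.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict[str, str], n_replicates: int = 2000, seed: int = 1):
    """Generate n_replicates pseudo-alignments (2,000 matches the text)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]
```

Resampling whole columns, rather than individual characters, preserves the site-wise homology that the NJ distances depend on.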



Results 

New MIR169 Gene Copies in the Rice, Sorghum, and 
Maize Genomes 

A miRNA cluster, as defined in the miRBase database (release 19, August 2012), is composed of two or more miRNA gene copies located on the same chromosome and separated from each other by a distance of 10 kb or less. The distance used to define a miRNA cluster is arbitrary, however, as evidenced by a cluster of 16 MIR2118 copies distributed over an 18-kb segment on rice chr4 (Sun et al. 2012). The sequencing of the sorghum genome allowed the identification of 17 MIR169 gene copies, of which five were arranged in two clusters: one on chr2 (sbi-MIR169f and sbi-MIR169g) and the other on chr7 (sbi-MIR169l, sbi-MIR169m, and sbi-MIR169n) (Paterson et al. 2009) (fig. 1 and table 1).
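The cluster definition quoted above (two or more copies on the same chromosome, at most 10 kb apart) can be applied to a list of loci as in the following sketch. The text does not spell out how miRBase measures the distance, so this version assumes single-linkage grouping on the lower coordinate of each locus; the function name and the 10 kb default are illustrative, and the cutoff is exposed as a parameter precisely because it is arbitrary.

```python
def group_into_clusters(loci, max_gap: int = 10_000):
    """Group loci into clusters of >= 2 copies on the same chromosome.

    loci: iterable of (name, chromosome, position), where position is the
    lower coordinate of each locus (assumption). After sorting, neighbouring
    copies join the same cluster when at most `max_gap` bp apart.
    Returns a list of (chromosome, [names]) in order of first appearance.
    """
    by_chrom: dict[str, list[tuple[int, str]]] = {}
    for name, chrom, pos in loci:
        by_chrom.setdefault(chrom, []).append((pos, name))
    clusters = []
    for chrom, items in by_chrom.items():
        items.sort()
        current = [items[0]]
        for prev, item in zip(items, items[1:]):
            if item[0] - prev[0] <= max_gap:
                current.append(item)
            else:
                if len(current) >= 2:
                    clusters.append((chrom, [n for _, n in current]))
                current = [item]
        if len(current) >= 2:
            clusters.append((chrom, [n for _, n in current]))
    return clusters
```

Applied to the lower coordinates of the sorghum chr7 and chr2 copies in Table 1, this rule recovers the five-copy and three-copy clusters. Applied to the four maize chr4 copies, only the antisense pair zma-MIR169h/zma-MIR169e would qualify, whereas Table 1 treats all four chr4 copies (spanning 271,605 bp) as one cluster, which illustrates how much the result depends on the chosen distance.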

We first analyzed the region containing the MIR169 cluster on sorghum chr7 because it had the highest number of gene copies. Aligning the sorghum genes flanking the MIR169 copies to the rice genome identified a collinear region on rice chr8 that also contains a cluster of MIR169 gene copies (fig. 2). Interestingly, the cluster on rice chr8 was composed of five MIR169 gene copies, whereas the orthologous cluster on sorghum chr7 contained only three annotated MIR169 gene copies. Further investigation based on reciprocal Blastn analysis revealed that osa-MIR169l and osa-MIR169q are orthologous to a region on sorghum chr7 where no MIR169 genes had previously been annotated. Indeed, by taking the sorghum DNA segment highly similar to osa-MIR169l and osa-MIR169q and submitting it to an RNA folding program (RNAfold: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi) to identify hairpin-like structures characteristic of microRNA precursors, we discovered two new MIR169 gene copies in sorghum, which we named sbi-MIR169r and sbi-MIR169s, respectively (fig. 2 and supplementary fig. S1, Supplementary Material online). Independent support for the new annotation of sbi-MIR169r and sbi-MIR169s came from orthologous alignment with a third species, maize, through the zma-MIR169e and zma-MIR169h gene copies (supplementary fig. S2, Supplementary Material online).
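Reciprocal Blastn analysis, as used here to pair rice and sorghum copies, is commonly implemented as a reciprocal-best-hit test: two sequences are paired when each is the other's highest-scoring hit. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, where the 12th column is the bit score) and ranks hits by bit score; that ranking choice and all function names are assumptions, and the identifiers in the usage note are placeholders, not real gene IDs.

```python
def parse_outfmt6(lines):
    """Yield (query, subject, bitscore) from BLAST tabular (-outfmt 6) lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], fields[1], float(fields[11])  # column 12 is the bit score


def best_hits(hits):
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)
```

For example, if placeholder copy osa_B hits sbi_Y best but sbi_Y hits osa_A best, the pair osa_B/sbi_Y is rejected; requiring agreement in both directions is what distinguishes orthologs from the closest of several paralogs in a cluster.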

To identify additional MIR169 gene copies in sorghum that might have arisen by tandem duplication, we used each of the annotated MIR169 genes as a Blastn query against the sorghum genome to search for new copies located close to any of the previously annotated ones. This analysis identified two new MIR169 copies on sorghum chromosome 1 (chr1) when sbi-MIR169o was used as the query, which we named sbi-MIR169t and sbi-MIR169u, respectively (supplementary fig. S1, Supplementary Material online). Thus, sbi-MIR169o together with sbi-MIR169t and sbi-MIR169u constitutes a new MIR169 cluster in the sorghum genome (table 1). The segment containing the newly
Fig. 1. Distribution of MIR169 gene copies in the genome of Sorghum bicolor cultivar BTx623. A total of 22 MIR169 gene copies are shown: 17 copies previously annotated by the sorghum genome-sequencing consortium (shown in black and red) (Paterson et al. 2009) and five additional MIR169 copies described for the first time in this study (shown in green). The evolutionary trajectory of the sorghum MIR169 gene copies arranged in clusters 1, 2, and 3 is described.



Table 1

Summary of MIR169 Gene Copies Described in This Study

Species | Chromosome | Gene ID(a) | Coordinates(b) | Strand | Distance between Genes Flanking the Cluster(c)
Brachypodium distachyon | chr1 | bdi-MIR169k | 1,175,425..1,175,598 | |
 | chr3 | bdi-MIR169e | 43,441,526..43,441,689 | | Cluster 1: bdi-MIR169e to bdi-MIR169g = 2,960 bp
 | chr3 | bdi-MIR169g | 43,444,486..43,444,666 | |
Oryza sativa | chr3 | osa-MIR169r | 35,782,397..35,782,553 | |
 | chr8 | osa-MIR169i | 26,891,154..26,891,261 | | Cluster 1: osa-MIR169i to osa-MIR169q = 14,446 bp
 | chr8 | osa-MIR169h | 26,895,354..26,895,475 | |
 | chr8 | osa-MIR169m | 26,901,902..26,902,039 | |
 | chr8 | osa-MIR169l | 26,905,493..26,905,600 | |
 | chr8 | osa-MIR169q | 26,905,600..26,905,493 | |
 | chr9 | osa-MIR169j | 19,788,861..19,788,985 | | Cluster 2: osa-MIR169j to osa-MIR169k = 3,272 bp
 | chr9 | osa-MIR169k | 19,792,133..19,792,288 | |
Setaria italica | chr9 | sit-MIR169o | 526,081..525,981 | |
 | chr2 | sit-MIR169f | 36,921,078..36,921,205 | | Cluster 1: sit-MIR169f to sit-MIR169h = 3,137 bp
 | chr2 | sit-MIR169g | 36,923,991..36,924,143 | |
 | chr2 | sit-MIR169h | 36,924,215..36,924,361 | |
 | chr6 | sit-MIR169i | 33,994,480..33,994,680 | | Cluster 2: sit-MIR169i to sit-MIR169s = 8,922 bp
 | chr6 | sit-MIR169j | 33,997,832..33,997,997 | |
 | chr6 | sit-MIR169k | 34,001,008..34,001,109 | |
 | chr6 | sit-MIR169r | 34,003,536..34,003,402 | |
 | chr6 | sit-MIR169s | 34,003,402..34,003,536 | |
Sorghum bicolor | chr1 | sbi-MIR169o | 1,029,916..1,029,814 | | Cluster 1: sbi-MIR169o to sbi-MIR169u = 7,321 bp
 | chr1 | sbi-MIR169t | 1,030,265..1,030,155 | - |
 | chr1 | sbi-MIR169u | 1,037,237..1,037,096 | - |
 | chr2 | sbi-MIR169f | 64,603,670..64,603,817 | + | Cluster 2: sbi-MIR169f to sbi-MIR169y = 3,049 bp
 | chr2 | sbi-MIR169g | 64,606,503..64,606,654 | + |
 | chr2 | sbi-MIR169y | 64,606,719..64,606,868 | + |
 | chr7 | sbi-MIR169r | 61,058,625..61,058,750 | + | Cluster 3: sbi-MIR169r to sbi-MIR169n = 12,648 bp
 | chr7 | sbi-MIR169s | 61,058,750..61,058,625 | - |
 | chr7 | sbi-MIR169l | 61,062,736..61,062,640 | - |
 | chr7 | sbi-MIR169m | 61,068,118..61,068,027 | - |
 | chr7 | sbi-MIR169n | 61,071,181..61,071,273 | + |
Zea mays | chr1 | zma-MIR169l | 298,277,019..298,277,107 | + |
 | chr2 | zma-MIR169j | 192,700,339..192,700,489 | + | Cluster 1: zma-MIR169j to zma-MIR169s = 277 bp
 | chr2 | zma-MIR169s | 192,700,616..192,700,748 | + |
 | chr4 | zma-MIR169i | 47,241,963..47,242,153 | + | Cluster 2: zma-MIR169i to zma-MIR169e = 271,605 bp
 | chr4 | zma-MIR169d | 47,454,177..47,454,304 | - |
 | chr4 | zma-MIR169h | 47,513,567..47,513,694 | + |
 | chr4 | zma-MIR169e | 47,513,695..47,513,568 | |
 | chr7 | zma-MIR169k | 135,706,179..135,706,311 | - |
Vitis vinifera | chr1 | vvi-MIR169y | 22,233,573..22,233,820 | + |
 | chr14 | vvi-MIR169z | 25,082,612..25,082,498 | | Cluster 1: vvi-MIR169z to vvi-MIR169q = 367 bp
 | chr14 | vvi-MIR169q | 25,082,865..25,082,717 | - |
 | chr17 | vvi-MIR169x | 355,713..355,837 | |
Glycine max | chr6 | gma-MIR169w | 13,783,352..13,783,225 | |
 | chr8 | gma-MIR169x | 717,092..717,226 | + | Cluster 1: gma-MIR169o to gma-MIR169p = 7,248 bp
 | chr8 | gma-MIR169y | 724,205..724,340 | + |
Manihot esculenta | scaffold01701 | mes-MIR169w | 436,633..436,794 | + |
 | scaffold09876 | mes-MIR169y | 536,510..536,709 | |

(a) In green are microRNA genes identified in this study.
(b) Chromosomal positions are based on Phytozome annotation for all species except rice, which is based on RAP-DB annotation.
(c) Distance within the cluster is calculated from the beginning of the first miRNA gene to the beginning of the last miRNA gene in the cluster.



identified MIR169 cluster on sorghum chr1 was collinear with an orthologous segment of rice chr3 (fig. 3), although no MIR169 gene had previously been found in this region. By performing reciprocal Blastn analysis with sbi-MIR169o against the rice genome, we identified the corresponding orthologous MIR169 copy on rice chr3, which we named osa-MIR169r (fig. 3 and supplementary fig. S1, Supplementary Material online). Furthermore, osa-MIR169r is contained within a segment that is collinear with an orthologous region of chr1 of a fourth species, Brachypodium, corresponding to bdi-MIR169k (fig. 3). Comparison between sorghum and maize revealed that the MIR169 cluster on sorghum chr1 is collinear with a segment on maize chr1 that contains zma-MIR169l (supplementary fig. S3, Supplementary Material online). Indeed, sbi-MIR169u and zma-MIR169l are also orthologous gene copies. Finally, when the cluster on sorghum chr2 containing sbi-MIR169f and sbi-MIR169g was analyzed, collinearity with the segment on sorghum chr7 containing the sbi-MIR169r/s and sbi-MIR169l/m/n copies revealed an additional MIR169 copy on sorghum chr2, which we named sbi-MIR169y (fig. 2; supplementary fig. S1, Supplementary Material online; and table 1). Furthermore, the sbi-MIR169f/g/y cluster is syntenic with a region on maize chr7 containing zma-MIR169k and with its homoeologous region on maize chr2 containing zma-MIR169j and the newly identified zma-MIR169s gene copy (supplementary figs. S1 and S4, Supplementary Material online; table 1).

In summary, by aligning sorghum chromosomal segments containing MIR169 clusters with orthologous regions of Brachypodium, rice, and maize, we identified five additional MIR169 copies in sorghum and one additional copy each in rice and maize.



Genome Biol. Evol. 5(2):402-417. doi:10.1093/gbe/evt015. Advance Access publication January 24, 2013



Comparative Genomics of MicroRNA169 



GBE 




Fig. 2. Syntenic alignment of rice and sorghum chromosomal segments containing MIR169 gene clusters. Sorghum MIR169 gene clusters on chr2 and chr7, together with their flanking protein-coding genes, were aligned with rice by orthologous gene pairs. Rice and sorghum chromosomes are represented as horizontal lines, and genes along the chromosomes as rectangular bars. Known MIR169 gene copies are shown as red bars, and new MIR169 gene copies described in this study as green bars. The bHLH and B-box zinc finger and CCT motif (B-box/CCT) genes are shown as yellow bars. All other protein-coding genes in the chromosomal regions under study are shown as black bars. Orthologous gene pairs are indicated by lines connecting bars: red lines indicate orthology between MIR169 gene pairs, and yellow lines indicate orthology between bHLH gene pairs and between B-box/CCT gene pairs. All other orthologous relationships between rice and sorghum protein-coding genes are indicated by black lines connecting black bars. The physical distance between bHLH and B-box/CCT genes and/or between bHLH or B-box/CCT genes and the flanking MIR169 copy is indicated. To provide a scale for the chromosomal segments highlighted in the figure, the physical distance between the first and the last gene in each segment is indicated; it serves as a reference for observing expansion and contraction of genomic regions. An inversion event on sorghum chr7 containing the MIR169 cluster occurred relative to the orthologous regions on sorghum chr2 and on rice chr8 and chr9, respectively.



New MIR169 Clusters in the Recently Sequenced Foxtail 
Millet Genome 

The recent release of the complete reference genome sequence of foxtail millet (Setaria italica) (Bennetzen et al. 2012; Zhang et al. 2012) greatly enhances comparative genomics within the Poaceae, with genome sequences now available for five species. Foxtail millet provided additional information for studying syntenic relationships with sorghum because the two species split approximately 26 Ma (Zhang et al. 2012). Indeed, 19 collinear blocks were found between foxtail millet and sorghum, comprising approximately 72% of the foxtail millet genome (Zhang et al. 2012). Consequently, we could use sorghum to identify and predict MIR169 gene copies in the foxtail millet genome. We identified and predicted foxtail millet MIR169 copies collinear with the sorghum MIR169 copies arranged in clusters on chr1, chr2, and chr7. The sorghum MIR169 cluster on chr1 was collinear with a segment on foxtail millet chr9, in which sit-MIR169o was identified as the ortholog of sbi-MIR169o (fig. 3; supplementary fig. S1, Supplementary Material online; and table 1). The sorghum MIR169 copies arranged in a cluster on chr7 were collinear with a segment on foxtail millet chr6 that harbored the newly identified orthologous MIR169 copies sit-MIR169i, sit-MIR169j, sit-MIR169k, sit-MIR169r, and sit-MIR169s (fig. 4; supplementary fig. S1, Supplementary Material online; and table 1). Finally,
Fig. 3. Sequence alignment of the sorghum MIR169 cluster on chr1 with orthologous regions from Brachypodium, rice, and foxtail millet. The sbi-MIR169o copy in sorghum allowed the identification of the orthologous osa-MIR169r copy in rice and the sit-MIR169o copy in foxtail millet. For the region containing sbi-MIR169o/t/u on chr1, we could not find sufficient conservation of synteny to identify an orthologous region in sorghum; thus, a synteny graph is shown only with sorghum chr1. An inversion event on rice chr3 occurred relative to Brachypodium, foxtail millet, and sorghum.
Fig. 4. Sequence alignment of the sorghum MIR169 cluster on chr7 with orthologous regions from Brachypodium, rice, and foxtail millet. Rice and sorghum MIR169 gene copies were used to identify and annotate five MIR169 genes in foxtail millet (shown in green). The bHLH and B-box/CCT genes were physically adjacent to MIR169 gene copies in the four species examined. The region examined on sorghum chr7 expanded relative to the orthologous regions from the other three grasses and was inverted only in sorghum.
Fig. 5. Sequence alignment of the sorghum MIR169 cluster on chr2 with orthologous regions from Brachypodium, rice, and foxtail millet. MIR169 gene copies were deleted from Brachypodium chr4, but the flanking genes remained. The MIR169 gene cluster in rice was composed of two copies, whereas in sorghum and foxtail millet the cluster comprised three copies. The bHLH gene was present in all four grasses and was physically adjacent to MIR169 gene copies in rice, sorghum, and foxtail millet. Sorghum MIR169 gene copies were used to identify and annotate the orthologous copies on foxtail millet scaffold 2 (shown in green).



tandem sorghum MIR169 copies on chr2 were collinear with a segment on foxtail millet chr2 that contained the three newly predicted MIR169 copies sit-MIR169f, sit-MIR169g, and sit-MIR169h (fig. 5; supplementary fig. S1, Supplementary Material online; and table 1).

In summary, we used sorghum as a reference genome to identify and predict nine MIR169 gene copies in foxtail millet that were collinear with sorghum copies. The prediction of MIR169 genes in foxtail millet will greatly facilitate their experimental validation through the sequencing of small RNAs from different tissues and developmental stages.

Gains and Losses of MIR169 Gene Copies during
Grass Evolution

To determine the expansion and contraction of the MIR169 gene clusters, we aligned collinear chromosomal segments of diploid Brachypodium, rice, and foxtail millet and the two homoeologous regions of allotetraploid maize. Based on nucleotide substitution rates, the cluster of MIR169 copies on sorghum chr7 was likely preserved from an ancestral grass chromosome and originally comprised five MIR169 gene copies, three of which were deleted in Brachypodium after the split of Brachypodium from the ancestor of rice, foxtail millet, and sorghum (figs. 4 and 6A and B). The number of MIR169 genes (five copies per cluster) was unchanged in rice, sorghum, and foxtail millet, whereas in maize four copies were retained on the orthologous homoeologous region on chr4 but none on the homoeologous region on chr1 (supplementary fig. S2, Supplementary Material online, and fig. 6A). Although the MIR169 copies were deleted from maize chr1, the flanking genes remained intact.

In the case of the MIR169 cluster on sorghum chr2, its evolution can be explained by two models (fig. 6A). In the first, the ancestor of the grasses had two MIR169 copies, which were conserved until the split of Brachypodium and rice; Brachypodium then lost both copies, whereas rice maintained them. An additional copy was gained in the common ancestor of foxtail millet, sorghum, and maize, giving rise to a cluster with three MIR169 gene copies. Phylogenetic analysis suggested that the new copy in the ancestor of foxtail millet, sorghum, and maize was the ancestral copy that gave rise to sit-MIR169h, sbi-MIR169y, and zma-MIR169s, respectively (fig. 6C). We estimated that this copy arose in the progenitor of foxtail millet, sorghum, and maize approximately 41.1 Ma (see Materials and Methods for the estimation of duplication time). Alternatively, the common ancestor of the grasses could have had three MIR169 gene copies, with one copy lost in the common ancestor of Brachypodium and rice and a subsequent loss of two
Fig. 6. Gains and losses of MIR169 gene copies during grass evolution. (A) Phylogenetic distribution of MIR169 gene copies in ancestral and current species, with gains and losses of MIR169 copy number during grass evolution. Numbers in squares represent the number of MIR169 gene copies in a given cluster in each species. Numbers along each line represent gains (+) and losses (-) of MIR169 gene copies. The estimated divergence time for each species is given at each node of the tree according to Paterson et al. (2009), Brachypodium-Sequencing-Initiative (2010), Bennetzen et al. (2012), and Zhang et al. (2012). The gain in MIR169 copy number of sorghum relative to Brachypodium is depicted. Note: WGD in maize is used to represent the allotetraploidy event that took place. NJ phylogenetic trees with bootstrap support depict the relationships of MIR169 stem-loop sequences from the grass species shown in (A). (B) NJ phylogenetic tree of Brachypodium (bdi) and rice (osa) MIR169 stem-loop sequences orthologous to sorghum MIR169 copies on chromosome 7. (C) NJ phylogenetic trees of rice (osa) and foxtail millet (sit) MIR169 stem-loop sequences (top) and of rice, foxtail millet, sorghum (sbi), and maize (zma) MIR169 stem-loop sequences (bottom) orthologous to MIR169 copies on sorghum chromosome 2. (D) NJ phylogenetic trees depicting the relationship of foxtail millet and maize MIR169 copies orthologous to sorghum MIR169 copies on chromosome 1 (top), and of Brachypodium, rice, foxtail millet, and maize MIR169 copies orthologous to sorghum MIR169 copies on chromosome 1 (bottom).



additional MIR169 gene copies in Brachypodium relative to rice (fig. 6A).

Regarding the cluster of MIR169 copies on sorghum chr1,
we favor a model in which the ancestor of the grasses had a
single MIR169 copy, because Brachypodium, rice, and foxtail
millet all have a single MIR169 copy in the orthologous region
(fig. 6D). Thus, the two additional MIR169 copies present in
the sorghum cluster could have arisen by duplication events.
Phylogenetic analysis suggested that the ancestral copy in the
cluster was sbi-MIR169o, from which sbi-MIR169t was
subsequently duplicated approximately 8.5 Ma (see Materials
and Methods) (fig. 6D). Thus, sbi-MIR169t was acquired
specifically in the sorghum lineage. Because sbi-MIR169u and
zma-MIR169\ are closely related to each other but only
distantly related to sbi-MIR169o and sbi-MIR169t (fig. 6D),
we postulate that the ancestral copy of sbi-MIR169u and
zma-MIR169\ was inserted next to the other MIR169 gene
copies in the progenitor of sorghum and maize. In the
maize lineage, diploidization after allotetraploidization led to
the deletion of the corresponding orthologous MIR169 copy
from the homoeologous segment on chrS, whereas the flank-
ing genes remained conserved (supplementary fig. S3,
Supplementary Material online).
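The procedure behind the 8.5 Ma estimate is described in the Materials and Methods, which are not part of this section. A common way to date a duplication from sequence divergence is the strict molecular clock relation T = K/(2r). The sketch below assumes that relation; the distance K is invented, and the rate of 6.5e-9 substitutions per site per year is an assumed value often quoted for grass genes, so this should not be read as the authors' exact calculation.

```python
def divergence_time_myr(k_per_site: float, rate_per_site_per_year: float) -> float:
    """Estimate a divergence time in millions of years under a strict molecular clock.

    T = K / (2r): K is the (multiple-hit corrected) number of substitutions per
    site between the two copies and r the substitution rate per site per year.
    The factor 2 reflects that substitutions accumulate independently along
    both descendant copies after the duplication.
    """
    if k_per_site < 0:
        raise ValueError("K must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    years = k_per_site / (2.0 * rate_per_site_per_year)
    return years / 1e6


# Hypothetical inputs for illustration only (K invented, rate assumed).
t = divergence_time_myr(0.1105, 6.5e-9)
print(f"estimated duplication time: {t:.1f} Ma")
```

Because T scales linearly with 1/r, the choice of rate dominates the result; a different calibration would shift the estimate proportionally.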

In summary, differences in MIR169 copy number between 
clusters from Brachypodium, rice, foxtail millet, sorghum, and 
maize arose by duplication of ancestral MIR169 genes that 
were retained or lost during grass evolution. Overall, sorghum 
gained eight MIR169 copies relative to Brachypodium, three 
copies relative to rice, two copies relative to foxtail millet, and 
three copies relative to maize. 
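The copy-number comparisons above follow from summing branch-wise gains and losses along a tree, as in fig. 6A. The sketch below illustrates this bookkeeping on a hypothetical two-lineage tree: only the Brachypodium (-5) and sorghum (+3) branch values echo the numbers given in the Discussion, and the root copy number is arbitrary, since it cancels when two tips are compared.

```python
from typing import Dict, Optional, Tuple

# Hypothetical tree: node -> (parent, net gains (+) / losses (-) of MIR169 copies
# on the branch leading to that node). Node names and the root count are invented.
TREE: Dict[str, Tuple[Optional[str], int]] = {
    "root": (None, 0),
    "Brachypodium": ("root", -5),
    "sorghum_lineage": ("root", 0),
    "sorghum": ("sorghum_lineage", +3),
}


def copies_at(node: str, root_copies: int,
              tree: Dict[str, Tuple[Optional[str], int]] = TREE) -> int:
    """Copy number at a node: root count plus every branch change on the root-to-node path."""
    total = root_copies
    current: Optional[str] = node
    while current is not None:
        parent, change = tree[current]
        total += change
        current = parent
    return total


def net_difference(a: str, b: str, root_copies: int = 10) -> int:
    """Copy-number difference between two tips; the unknown root count cancels out."""
    return copies_at(a, root_copies) - copies_at(b, root_copies)


print(net_difference("sorghum", "Brachypodium"))
```

Losses on one branch and gains on the other both contribute to the difference, which is why a net excess of eight copies can arise from five losses plus three gains.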

Polymorphisms in Chromosomal Inversions Containing 
MIR169 Clusters

Through the analysis of three chromosomal regions in sor-
ghum containing MIR169 clusters and their alignment with
the genomes of Brachypodium, rice, foxtail millet, and maize,
we identified four chromosomal inversions in total:
one on rice chr3 containing osa-MIR169r (fig. 3); a second
on sorghum chr7 containing sbi-MIR169r, sbi-MIR169s,
sbi-MIR169l, sbi-MIR169m, and sbi-MIR169n (fig. 2); a
third on maize chr1 containing zma-MIR169\ (supplementary
fig. S3, Supplementary Material online); and a fourth on
maize chr7 containing zma-MIR169k (supplementary fig. S4,
Supplementary Material online). The inversion on
rice chr3 was absent from the corresponding collinear regions
on Brachypodium chr1, sorghum chr1, and foxtail millet chr9
(fig. 3), indicating that it occurred in the rice lineage after
rice split from the common ancestor of sorghum and foxtail millet.
The region on sorghum chr1 containing sbi-MIR169o,
sbi-MIR169t, and sbi-MIR169u, which was collinear with the
inverted segment on rice chr3, was also collinear with an
inverted segment on the homoeologous region of maize
chr1 containing zma-MIR169\ (supplementary fig. S3,
Supplementary Material online). However, the inversion was
absent from the homoeologous region on maize chrS, indi-
cating that it occurred after the allotetraploidization
event in maize. The inversion on sorghum
chr7 containing the sbi-MIR169r, sbi-MIR169s, sbi-MIR169l,
sbi-MIR169m, and sbi-MIR169n cluster occurred only in this
species (supplementary fig. S2, Supplementary Material
online, and fig. 4), suggesting that it took place after
sorghum split from its common ancestor with maize.
The MIR169 cluster on sorghum chr2 was collinear
with an inverted region on maize chr7 containing
zma-MIR169k (supplementary fig. S4, Supplementary Material
online). The homoeologous region on maize chr2 did not exhibit
the inversion, suggesting that it took place after the allotetra-
ploidization event in maize.

In summary, four inversions containing MIR169 copies
were found in total: one in rice, one in sorghum, and two in
maize. These inversions were lineage specific, as none of them
was present in a collinear region in the genome of a second
grass species, indicating that each inversion arose after the
corresponding lineage had diverged from the others.
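Calling a segment "inverted" relative to another genome rests on the order of orthologous gene pairs: along a collinear block, positions in the second genome either mostly increase or mostly decrease. The sketch below is a simplified majority vote over adjacent steps, not the alignment method used in the study, and the coordinates are hypothetical. Under parsimony, when only one of the five grasses shows the inverted orientation, the inversion is placed on that species' own branch.

```python
from typing import List, Tuple


def block_orientation(pairs: List[Tuple[int, int]]) -> str:
    """Classify a collinear block from ortholog pairs (position in genome A, position in genome B).

    After sorting by position in A, count the steps where the B coordinate goes
    up versus down. Mostly-up means the same orientation in both genomes;
    mostly-down means the block is inverted in one genome relative to the other.
    """
    if len(pairs) < 2:
        return "ambiguous"
    ordered = sorted(pairs)
    steps = [b2 - b1 for (_, b1), (_, b2) in zip(ordered, ordered[1:])]
    up = sum(1 for s in steps if s > 0)
    down = sum(1 for s in steps if s < 0)
    if up == down:
        return "ambiguous"
    return "collinear" if up > down else "inverted"


# Hypothetical gene coordinates (kb), for illustration only.
print(block_orientation([(10, 500), (20, 480), (35, 455), (50, 430)]))  # inverted
print(block_orientation([(10, 100), (20, 115), (35, 130)]))  # collinear
```

A majority vote tolerates a few locally rearranged genes inside an otherwise inverted block; real pipelines usually also require a minimum number of anchor pairs per block.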

Validation of Newly Identified MIR169 Gene Copies in 
Sorghum and Maize 

To experimentally validate the new MIR169 gene copies found
in sorghum through our syntenic analysis among grasses, we
mapped previously sequenced small RNAs from sorghum
stems (Calvino et al. 2011) to the newly predicted MIR169t/
u/v/r/s hairpins. Similarly, to validate the newly described
zma-MIR169s gene copy in maize, we constructed small
RNA libraries from endosperm tissue of the cultivars
B73 and Mo17 and their reciprocal crosses (supplementary
table S1, Supplementary Material online). Maize endosperm-
derived small RNAs were then mapped to the new MIR169s
hairpin annotated in this study. We were able to map small
RNA reads to the stem-loop sequences of all five predicted
MIR169 genes in sorghum (for sbi-MIR169r/s, see the
next section). In the case of sbi-MIR169t and sbi-MIR169u, the
most abundant small RNA reads were derived from the
miR169* sequence (supplementary fig. S5, Supplementary
Material online), although small RNAs derived from the canon-
ical miR169 sequence were also found at lower abundance.
The experimental validation of sbi-MIR169v was supported
by small RNAs mapping to the corresponding
predicted mature miR169v sequence (supplementary fig. S5,
Supplementary Material online). For the experimental
validation of the predicted zma-MIR169s copy in maize, we
detected small RNA reads derived from miR169s,
although their abundance was very low (supplementary fig.
S5, Supplementary Material online).
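The validation step above amounts to placing reads on a predicted hairpin and asking which arm (miR169 or miR169*) they come from. The sketch below uses exact substring matching in place of a real short-read aligner; the hairpin, arm coordinates, reads, and the 2-nt tolerance are all hypothetical and chosen only to show the bookkeeping.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_arm_reads(hairpin: str, reads: Iterable[str],
                    arms: Dict[str, Tuple[int, int]], tolerance: int = 2) -> Counter:
    """Tally reads that map exactly to a hairpin, by the arm they fall on.

    `arms` maps an arm name to its 0-based half-open interval [start, end) on
    the hairpin. A read is assigned to an arm when its 5' end lies within
    `tolerance` nt of the arm start and it does not extend more than
    `tolerance` nt past the arm end. Only the first exact occurrence is used.
    """
    counts: Counter = Counter()
    for read in reads:
        start = hairpin.find(read)
        if start == -1:
            counts["unmapped"] += 1
            continue
        end = start + len(read)
        for name, (arm_start, arm_end) in arms.items():
            if abs(start - arm_start) <= tolerance and end <= arm_end + tolerance:
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Toy hairpin and reads (hypothetical, not real MIR169 sequences).
hairpin = "TTTACGTACGTAAAGGGCATGCATGCTTT"
arms = {"miR169": (3, 11), "miR169*": (17, 25)}
reads = ["ACGTACGT", "CATGCATG", "CATGCATGC", "AAAGGG", "GGGGGGG"]
print(count_arm_reads(hairpin, reads, arms))
```

In this toy case the star arm collects more reads than the mature arm, which is the kind of pattern described above for sbi-MIR169t and sbi-MIR169u.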

Antisense MicroRNA169 Gene Pairs Generate Small 
RNAs that Target Different Sets of Genes

In rice, osa-MIR169\ and osa-MIR169q were annotated as
antisense microRNAs, and small RNA reads derived from
both strands were identified (Xue et al. 2009). In sorghum,
sbi-MIR169r and sbi-MIR169s are collinear with osa-MIR169\/q
(figs. 2 and 4) and are antisense microRNAs as well (supple-
mentary figs. S1 and S6A, Supplementary Material online).
Despite the lack of Expressed Sequence Tag (EST) evidence
for the sbi-MIR169r and sbi-MIR169s annotations, our previously
generated small RNA library from sorghum stem tissue
(Calvino et al. 2011) supported transcription from both
strands, based on small RNA reads mapped to
sbi-MIR169r and sbi-MIR169s, respectively (supplementary
fig. S6A, Supplementary Material online). Similarly, EST evi-
dence supported transcription from opposite strands for
the microRNA antisense pair zma-MIR169e/h (ESTs
ZM_BFb0354L14.r and ZM_BFb0294A24.f, respectively).
Because small RNAs derived from zma-MIR169e/h had not
been previously reported (miRBase database: release 19,
August 2012), we used the SOLiD system to sequence small
RNAs from endosperm tissue of the B73 and Mo17
cultivars and their reciprocal crosses; however, we could not
detect small RNA reads derived from them, at least in endo-
sperm tissue. Thus, antisense microRNAs from MIR169 gene
copies are actively produced in rice and sorghum, and
possibly in maize.
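For an antisense gene pair, the hairpin on one strand is the reverse complement of the hairpin on the other, so a small RNA read can be assigned to a strand by checking which orientation contains it. The sketch below illustrates this with exact matching on a hypothetical locus; the function names are illustrative and not taken from any specific tool.

```python
from typing import Optional

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq: str) -> str:
    """Upper-case DNA alphabet (U -> T), so RNA reads can be matched to genomic DNA."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence, returned in the DNA alphabet."""
    return normalize(seq).translate(DNA_COMPLEMENT)[::-1]


def strand_of(read: str, locus_plus: str) -> Optional[str]:
    """Report which strand of a locus an exact-match read derives from."""
    read, locus_plus = normalize(read), normalize(locus_plus)
    on_plus = read in locus_plus
    on_minus = read in reverse_complement(locus_plus)
    if on_plus and on_minus:
        return "both"  # e.g. palindromic reads cannot be assigned unambiguously
    if on_plus:
        return "+"
    if on_minus:
        return "-"
    return None


locus = "ATGGCCTTAGCAACG"  # hypothetical plus-strand sequence
print(reverse_complement(locus))
print(strand_of("GCCUUA", locus), strand_of("CUAAGG", locus))
```

Reads found on both orientations, as in the zma-MIR169e/h and sbi-MIR169r/s cases, are what indicate transcription from both strands of the locus.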

With respect to the sbi-MIR169r/s antisense gene pair, we
found that the small RNA reads mapped to sbi-MIR169r were
predominantly associated with the miR169r* sequence (sup-
plementary fig. S6A, Supplementary Material online). The
mature miRNA sequences of sbi-miR169r* and sbi-miR169s
differed from each other at seven nucleotides (supplementary
fig. S6B, Supplementary Material online). Moreover, based on
their sequences, they would be predicted to target different
sets of genes (supplementary figs. S7 and S8, Supplementary
Material online). Functional roles for microRNA* species
have also been described recently
(Meng et al. 2011; Yang et al. 2011).
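Why a seven-nucleotide difference changes predicted targets can be illustrated with a toy example: plant miRNAs usually pair with their targets with near-full complementarity, so a site that pairs perfectly with one miRNA may carry many mismatches against the other. The two 21-nt sequences below are hypothetical (not the real sbi-miR169r* and sbi-miR169s sequences), and counting G:U wobbles as plain mismatches is a simplification of the scoring schemes used by target prediction tools.

```python
RNA_COMPLEMENT = str.maketrans("ACGUT", "UGCAA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement in the RNA alphabet (T is treated like U)."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def site_mismatches(mirna: str, site: str) -> int:
    """Mismatches when a miRNA pairs antiparallel with a target site written 5'->3'.

    Simplification: G:U wobble pairs are counted as ordinary mismatches.
    """
    return hamming(mirna.upper(), reverse_complement_rna(site))


# Hypothetical 21-nt mature sequences differing at seven positions.
mir_a = "UGCAGUCAGUCAGUCAGUCAG"
mir_b = "ACGUCAGAGUCAGUCAGUCAG"
site_for_a = reverse_complement_rna(mir_a)  # a perfect target site for mir_a

print(hamming(mir_a, mir_b))
print(site_mismatches(mir_a, site_for_a), site_mismatches(mir_b, site_for_a))
```

A site with zero mismatches to one miRNA and seven to the other would typically be retained as a target for the first and rejected for the second under common plant target-prediction cutoffs.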

Linkage of MIR169 Gene Copies with Flowering and 
Plant Height Genes 

Based on the alignment of collinear regions containing
MIR169 genes located on sorghum chr2 and chr7, we noticed
a tight linkage of MIR169 copies with two genes, encoding a
bHLH protein and a B-box zinc finger and CCT-motif protein,
that were similar to Arabidopsis bHLH137 and CONSTANS-
LIKE 14 (COL14), respectively (figs. 2, 4, and 5 and supplementary figs. S2
and S4, Supplementary Material online). The Arabidopsis
bHLH137 and COL14 genes have been described as having roles
in gibberellin signaling (mutations in genes involved in gibber-
ellin signaling and/or perception affect plant height
[Fernandez et al. 2009]) and flowering time, respectively
(Griffiths et al. 2003; Wenkel et al. 2006; Zentella et al.
2007). The physical linkage of MIR169 gene copies to the bHLH
and/or COL genes was present in all five
grasses examined. We hypothesized that the physical associ-
ation of MIR169 with either of these flowering and/or plant
height genes could be relevant because of previously
reported trade-offs in sorghum between stem sugar content
and plant height and flowering time
(Murray et al. 2008). For breeding purposes, introgressing
a particular gene or phenotype from one cultivar into
another also brings in the neighboring
genes, a process known as linkage drag. Furthermore, linkage
between MIR169 copies and the bHLH and COL genes
could also be of ecological importance, because a single chro-
mosomal segment would then comprise genes involved in drought toler-
ance, sugar accumulation, and flowering. If this is the case,
linkage of MIR169 copies to either bHLH or COL genes could
have been preserved even beyond the monocot-dicot diver-
gence. Indeed, we found collinearity between
chromosomal segments containing MIR169 and bHLH genes
from Brachypodium, sorghum, soybean, and cassava (fig. 7).
Moreover, the physical linkage between
MIR169 and the bHLH gene on sorghum chr7 was retained
in collinear regions of soybean chr6 and cassava scaffold
01701 (fig. 7). Similarly, the physical/genetic as-
sociation of MIR169 with the bHLH gene from sorghum chr2
was retained in the corresponding collinear regions of soy-
bean chr8 and cassava scaffold 09876 (fig. 8). Interestingly,
the linkage between MIR169 and the COL gene that was
present on Brachypodium chr3 and sorghum chr7 was
broken in the corresponding collinear regions of soybean
chr6 and cassava scaffold 01701 (fig. 7). We then compared
the two MIR169 clusters from sorghum chr2 and chr7 with
the grapevine genome, because grapevine is more closely
related to sorghum than soybean and cassava are.
Our comparison revealed a two-to-three relation-
ship between sorghum and grapevine (fig. 9), consistent
with the paleohexaploidy event in the
grapevine genome (Jaillon et al. 2007). The physical/genetic
linkage of MIR169 copies with the COL gene on sorghum chr7
was preserved in two of the three homoeologous chromo-
somal segments in grapevine, on chr1 and chr14, whereas the
third homoeologous segment, on chr17, retained the close as-
sociation of MIR169 with the bHLH gene.

The finding of microsynteny conservation between mono-
cot and dicot species in chromosomal segments containing
MIR169 gene copies together with bHLH and COL genes is
remarkable, because the estimated time of divergence be-
tween monocots and dicots is approximately 130-240 Ma
(Wolfe et al. 1989; Jaillon et al. 2007). Such microsynteny
conservation permitted the discovery of new MIR169 gene
copies in soybean (gma-MIR169w, gma-MIR169x, and gma-
MIR169y), cassava (mes-MIR169w and mes-MIR169y), and
grapevine (vvi-MIR169z).

Subfunctionalization of the bHLH Gene in the MIR169
Cluster of Brachypodium 

The microsynteny in chromosomal segments containing
miR169 gene copies flanked by the bHLH gene among such
distantly related species as Brachypodium and cassava
suggests that the linkage between miR169 and bHLH has been
maintained by selection since these species diverged from a common
ancestor approximately 130-240 Ma. In support of this inter-
pretation, the bHLH gene on Brachypodium chr4, where the
Fig. 7. Sequence alignment of sorghum MIR169 cluster on chr7 with orthologous regions from Brachypodium, soybean, and cassava. There is
conservation of synteny between the monocot species Brachypodium and sorghum and the dicot species soybean and cassava when chromosomal segments
containing MIR169 gene copies and their flanking genes are aligned. Conservation of synteny allowed the identification of new MIR169 gene copies on
soybean chromosome 6 (gma-MIR169w) and cassava scaffold 01701 (mes-MIR169w). The physical association on the chromosome between
MIR169 and the flanking bHLH gene was retained in soybean and cassava as well. Notice the inversion on soybean chr6.



miR169 cluster had been deleted, appears to have under-
gone subfunctionalization. First, the bHLH copy on
Brachypodium chr4 has lost the basic domain,
which is involved in DNA binding (Toledo-Ortiz 2003), and has
thus evolved into an HLH protein (supplementary fig. S9A
and B, Supplementary Material online). bHLH pro-
teins act as homo- and/or heterodimers in which the basic
domain of each partner binds DNA; HLH proteins, which lack
the basic domain, can still homo- or heterodimerize but prevent
the resulting complex from binding DNA, and thus act as negative
regulators (Toledo-Ortiz 2003). Second, Brachypodium has a redundant intact
orthologous copy on chr3, which also has an miR169 cluster next to it
(supplementary fig. S9, Supplementary Material online). Third,
the synonymous and nonsynonymous substitution rates of the
HLH orthologous gene pairs were higher than the synonymous
and nonsynonymous substitution rates of the bHLH ortholo-
gous gene pairs, respectively (supplementary fig. S9C, Supple-
mentary Material online). Fourth, when we ran a test for
detecting adaptive evolution (calculated as the number of re-
placement mutations per replacement site [dN] divided by the
number of silent mutations per silent site [dS]) on the bHLH and
HLH coding sequences, we found evidence of purifying selec-
tion on the HLH gene sequence (dN/dS ratio of -4.647).
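The dN/dS ratio described above compares replacement substitutions per replacement site with silent substitutions per silent site. As a minimal sketch, assuming the numbers of differences and sites have already been counted (for example with the Nei-Gojobori method), both distances can be corrected for multiple hits with the Jukes-Cantor formula before taking the ratio; by convention, values below 1 indicate purifying selection and values above 1 indicate positive selection. The counts and the tolerance band around 1 are hypothetical, and this is not necessarily the test the authors ran.

```python
import math
from typing import Tuple


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differing sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> Tuple[float, float, float]:
    """Return (dN, dS, omega = dN/dS) from pre-counted differences and sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ZeroDivisionError("dS is zero; omega is undefined")
    return dn, ds, dn / ds


def interpret(omega: float, tolerance: float = 0.05) -> str:
    """Conventional reading of omega; the tolerance band around 1 is an arbitrary choice."""
    if omega < 1.0 - tolerance:
        return "purifying selection"
    if omega > 1.0 + tolerance:
        return "positive selection"
    return "approximately neutral"


# Hypothetical counts for one ortholog pair, for illustration only.
dn, ds, omega = dn_ds(nonsyn_diffs=12, nonsyn_sites=600, syn_diffs=30, syn_sites=200)
print(f"dN={dn:.4f} dS={ds:.4f} dN/dS={omega:.3f} -> {interpret(omega)}")
```

Because both dN and dS are non-negative, the ratio itself is non-negative; what distinguishes the selection regimes is whether it falls below, near, or above 1.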



Conservation of synteny between sorghum and grapevine
showed that the linkage between MIR169 gene copies and
the COL gene was maintained in both species. Both COL
genes in grapevine, on chr14 and on chr1, lost the B-box
and zinc finger domain, whereas the orthologous copy in
sorghum retained it (supplementary fig. S10A and B,
Supplementary Material online). Similarly, the foxtail millet COL
protein lost the B-box and zinc finger domain, whereas the
Brachypodium, rice, and maize proteins retained it. The B-box and
zinc finger domain is thought to mediate protein-protein
interactions, whereas the CCT domain acts as a nuclear local-
ization signal; mutations in both domains cause flower-
ing time phenotypes (Griffiths et al. 2003; Wenkel et al. 2006;
Valverde 2011). Although the COL gene on grapevine chr14
has recently been identified as a candidate gene for a flower-
ing quantitative trait locus (QTL) (Duchene et al. 2012), the
function of its orthologous copy on sorghum
chr7 remains to be elucidated.

Discussion 

We describe the alignment of 25 chromosomal regions with 
orthologous gene pairs from eight different plant species. 
Fig. 8. Sequence alignment of sorghum MIR169 cluster on chr2 with orthologous regions from Brachypodium, soybean, and cassava. The alignment
of the sorghum MIR169 cluster on chr2 with soybean chr8 and cassava scaffold 09876 allowed the identification of two new MIR169 gene copies in soybean
(gma-MIR169x and gma-MIR169y) and one new copy in cassava (mes-MIR169y). The physical association of MIR169 gene copies with the bHLH gene
was retained in soybean and cassava. An inversion occurred on soybean chr8.



These regions contain a total of 48 MIR169 gene copies,
22 of which are described and annotated here
for the first time. The alignment of sorghum chromosomal
regions containing MIR169 clusters with their corresponding
orthologous regions from Brachypodium, rice, foxtail millet,
and maize allowed us not only to better under-
stand the differential amplification of MIR169 gene copies
during speciation but also to identify new MIR169 gene
copies not previously annotated in the rice, sorghum, and
maize genomes. Our work highlights the usefulness of this
approach for discovering microRNA gene copies in grass
genomes and, surprisingly, also in dicotyledonous genomes
such as those of grapevine, soybean, and cassava. In addi-
tion, collinearity among grasses was used to predict and an-
notate MIR169 hairpin structures de novo in the foxtail millet
genome, for which no microRNA annotation was
available in the miRBase database (Release 19: August
2012). Our work suggests that synteny-based analysis
should complement (whenever possible) homology-based
searches for new microRNA gene copies in plant genomes.
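The synteny-based search can be summarized in two steps: orthologs of the genes flanking a known MIR169 copy delimit a window in the target genome, and that window is then scanned on both strands for near-matches to a mature miR169 sequence. The sketch below shows only these two steps under stated assumptions (hypothetical coordinates, flank size, mismatch cutoff, and toy sequences); candidate hits would still need hairpin secondary-structure prediction and expression evidence, as described in the Results.

```python
from typing import List, Sequence, Tuple

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _revcomp(seq: str) -> str:
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def _mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def search_window(anchor_positions: Sequence[int], flank: int) -> Tuple[int, int]:
    """Interval spanned by the orthologous flanking genes, padded by `flank` on each side."""
    return max(0, min(anchor_positions) - flank), max(anchor_positions) + flank


def scan_window(genome: str, start: int, end: int, mature: str,
                max_mismatches: int) -> List[Tuple[int, str, int]]:
    """Return (position, strand, mismatches) for near-matches to `mature` inside [start, end)."""
    k = len(mature)
    probes = {"+": mature, "-": _revcomp(mature)}
    hits = []
    for i in range(max(0, start), min(len(genome), end) - k + 1):
        word = genome[i:i + k]
        for strand, probe in probes.items():
            mm = _mismatches(word, probe)
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return hits


# Toy example: hypothetical mature sequence and target-genome fragment.
mature = "TACCGATTC"
genome = "GGGGGTACCGATTCGGGGG"
window = search_window([0, len(genome)], flank=0)
print(scan_window(genome, window[0], window[1], mature, max_mismatches=0))
```

Restricting the scan to a syntenic window is what lets a permissive mismatch cutoff be used without flooding the search with genome-wide false positives.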

Our analysis of MIR169 gene copies organized in clusters in
the sorghum genome revealed that sorghum has eight more
MIR169 gene copies than Brachypodium since their divergence from a common
ancestor, owing primarily to gene losses (up to five MIR169 gene
copies) in the Brachypodium lineage and to new gene copies
(up to three) in the sorghum lineage (fig. 6A). We propose that
the difference in MIR169 gene copy number between sorghum
and Brachypodium reflects selective amplification in sor-
ghum. Diploidization of the maize genome, which deleted
duplicated gene copies after the allotetraploidization event
approximately 4.7 Ma (Messing et al. 2004; Swigonova
et al. 2004), likewise resulted in an apparent selective
amplification in sorghum relative to maize.
Maize lost more than half (9 of 16) of its MIR169 gene copies after
allotetraploidization. Single gene losses in maize appear to be
caused by short deletions, predominantly in the
5-178 bp size range, that are approximately
2.3 times more frequent in one homoeologous chromosome
than in the other (Woodhouse et al. 2010). This observation is
particularly relevant to maize microRNA genes, whose primary
microRNAs (pri-miRNAs) have 5' regions averaging on the
order of 100-300 nt (Zhang et al. 2009). Although we detected
chromosome breaks affecting the MIR169 neighboring gene COL14
on the maize homoeologous chr1-chr4 pair (supplementary fig. S2,
Supplementary Material online) and the bHLH gene on the maize
homoeologous chr2-chr7 pair (supplementary fig. S4, Supplementary
Material online), the bHLH gene copy was retained on both
homoeologous regions from chr1 and chr4
(supplementary fig. S2, Supplementary Material online).
Transcription factors have been observed to be preferentially
retained after whole-genome duplication (WGD) (Xu and
Messing 2008; Murat et al. 2010), with a recent study
Fig. 9. Conservation of synteny between sorghum and grapevine chromosomal segments containing MIR169 gene copies. Sorghum segments
containing MIR169 gene clusters from chr2 and chr7 were aligned to the grapevine genome based on orthologous gene pairs. Because grapevine is a
paleohexaploid, we found a 2:3 chromosomal relationship between sorghum and grapevine. Collinearity allowed the identification of a new MIR169
copy (vvi-MIR169z) on grapevine chr14. Different grapevine chromosomes are represented in colors, whereas sorghum chromosomes are in black. Relative to
sorghum chr2, grapevine had an inversion event on chr14 and chr17. The association of MIR169 with its flanking COL gene was maintained on grapevine
chr14 and chr1, whereas the association of MIR169 with the bHLH gene was maintained on chr17.



showing that, of 2,943 syntenic genes shared between sorghum and maize,
43% were retained as homoeologous pairs
in maize, and that transcription factors were 4.3 times
more frequent among retained genes than genes with other functions
(Woodhouse et al. 2010).
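The proportions quoted in the two preceding paragraphs can be checked with simple arithmetic. The sketch below uses only figures stated in the text (9 of 16 maize MIR169 copies lost; 43% of 2,943 syntenic genes retained as homoeologous pairs); since 43% is itself a rounded value, the derived gene count is approximate.

```python
def percent(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Figures taken from the text.
lost_pct = percent(9, 16)                 # maize MIR169 copies lost after allotetraploidization
retained_pairs = round(0.43 * 2943)       # approximate number of retained homoeologous pairs

print(f"MIR169 copies lost in maize: {lost_pct:.2f}%")
print(f"approximate homoeologous pairs retained: {retained_pairs}")
```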

Alignment of sorghum regions containing MIR169 gene
copies on chr2 and chr7 with their respective collinear regions
from Brachypodium, rice, foxtail millet, and maize revealed the
close linkage of MIR169 gene copies with their flanking
COL14 and bHLH genes in all five grasses examined.
Furthermore, collinearity of MIR169 gene copies with
the COL14 and/or the bHLH genes extended to dicot species
such as grapevine, soybean, and cassava. It has previously been
suggested that conservation of collinearity between monocot
and dicot species is rather rare because of the dynamic geno-
mic rearrangements that have occurred over 130-240 Myr (Wolfe
et al. 1989; Jaillon et al. 2007). Still, conservation of synteny
between rice and grapevine has also been observed
(Tang et al. 2010). We therefore hypothesized that, in these rare
cases, preservation of collinearity was subject to selection even
after WGD events. In support of this hypothesis, the HLH gene
on Brachypodium chr4, where the MIR169 cluster was deleted,
shows pseudofunctionalization and a higher protein divergence
rate than the orthologous bHLH copy on chr3, which retains the
MIR169e and MIR169g copies next to it. Indeed, trade-offs between
sugar content and flowering time/plant height have been reported in sorghum
(Murray et al. 2008). When two genes controlling linked phe-
notypes lie close enough on the chromosome for selec-
tion to act on both of them, the loss of one gene relaxes
selection pressure on the other, allowing it to diverge.
On the basis of its similarity to Arabidopsis bHLH137, which
was postulated to be a DELLA target gene that functions
in the GA response pathway (Zentella et al. 2007), we hypoth-
esize that the grass homolog may function in flowering
and/or plant height, which future research will have to con-
firm. On the other hand, the importance of COL family pro-
teins in the regulation of flowering time is well known
(Griffiths et al. 2003; Wenkel et al. 2006). Collinearity be-
tween sorghum and grapevine revealed the tight association
of COL14 with vvi-MIR169z and vvi-MIR169e on grapevine
chr14, with the three genes contained within a 2.3 kb interval.
Furthermore, COL14 has recently been considered a candidate
gene for a flowering QTL in grapevine (Duchene et al. 2012).
With such a short physical distance between a flowering time
gene and two MIR169 gene copies, it is tempting to propose
that grapevine breeding for late or early flowering time could
have brought different COL14 alleles together with their neigh-
boring MIR169 genes, a process known as linkage drag.
Interestingly, although we could not find collinearity between
sorghum and Arabidopsis thaliana extensive enough to draw a syn-
teny graph, we did find a close association on chrS between
the COL4 gene and ath-MIR169b, which are separated by 61.7 kb
(data not shown).

On the basis of these considerations, we propose the
hypothesis that MIR169 gene copies and
the neighboring COL gene could have coevolved (supplemen-
tary fig. S11, Supplementary Material online). This hypothesis
is based on the findings presented here, together with a pre-
vious report that CO and COL proteins can interact
through their CCT domains with proteins belonging to the
NF-Y (HAP) family of transcription factors (Wenkel et al.
2006); specifically, CO together with COL15 was shown to
interact with NF-YB and NF-YC, displacing NF-YA
from the ternary complex. The mRNAs encoded by the
NF-YA gene family are known targets of miR169 (Li et al.
2008). Thus, the chromosomal association of a COL
gene with a MIR169 gene or gene cluster would ensure that
miR169 reduces the expression of the NF-YA mRNA, and
thus its protein levels, so that the COL protein can replace
NF-YA in the ternary complex and drive transcription of
CCAAT-box genes. Furthermore, this hypothesis could provide a
genetic framework in which to test the previously known
drought and flowering trade-offs: when plants are exposed
to drought stress during the growing season, they flower ear-
lier than control plants in well-watered environments
(Franks et al. 2007), with the response being genetically in-
herited. For this reason, we term our model the
"Drought and Flowering Genetic Module Hypothesis."

We can envision a prominent role for linkage drag in breed-
ing sorghum for enhanced biofuel traits, such as high sugar
content in stems and late flowering for increased bio-
mass. Under the MIR169-bHLH and/or MIR169-COL linkage
drag model, any breeding scheme in sweet sorghum that
aims to increase plant biomass through delayed flowering by
crossing cultivars with different COL and/or bHLH alleles on
either chr7 or chr2, respectively, should take into account the
allelic variation at the neighboring MIR169 gene copies, as they
may affect sugar content in stems and drought tolerance. The
same applies to breeding sorghum for grain production,
where the norm is to increase germplasm diversity among
grain sorghums by introducing dwarf and early
flowering genes from a donor line into exotic tall and late
flowering lines of African origin (Brown et al. 2008).
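How strongly linkage drag binds a MIR169 copy to a neighboring COL or bHLH allele depends on how rarely crossing over separates them. A minimal sketch, assuming a uniform recombination rate, no crossover interference (Haldane map function), and one meiosis per generation of backcrossing: the probability that the two loci stay together over n meioses is (1 - r)^n. The rate of 2 cM/Mb and the six generations are hypothetical; only the 2.3 kb and 61.7 kb distances come from the text.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Haldane map function: r = 0.5 * (1 - exp(-2d)), with d in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def prob_no_recombination(distance_bp: float, cm_per_mb: float, meioses: int) -> float:
    """Probability that two loci are not separated by crossing over in `meioses` successive meioses."""
    d_morgans = (distance_bp / 1e6) * cm_per_mb / 100.0
    r = haldane_recombination_fraction(d_morgans)
    return (1.0 - r) ** meioses


# Hypothetical 2 cM/Mb and six backcross generations; distances from the text.
for label, bp in [("COL14-MIR169 (2.3 kb)", 2_300), ("COL4-MIR169b (61.7 kb)", 61_700)]:
    print(label, round(prob_no_recombination(bp, cm_per_mb=2.0, meioses=6), 4))
```

Real recombination rates vary widely along plant chromosomes and are often suppressed near centromeres or within inversions, so locus-specific genetic maps would be needed for any breeding decision.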

On the basis of our comparative genomics results, we
envision that closely associated genes (in this particular study, a
microRNA and a protein-coding gene) that control related
phenotypes and whose collinearity is conserved among several
plant species might be subject to linkage drag through breeding,
opening a new area of research in genomics-assisted breeding. In
support of this notion, the early development of conserved
ortholog set markers (referred to as COS markers) among differ-
ent plant species (Fulton et al. 2002) highlighted the existence
of a set of genes whose synteny has been conserved since the
early radiation of dicotyledonous plants and that can be used in
mapping through comparative genomics. In addition, conser-
ved linkage between candidate genes for seed glucosi-
nolate content and SSR markers in Arabidopsis and
oilseed rape (Brassica napus ssp. napus) was used in
marker-assisted selection when breeding oilseed rape for total
glucosinolate content (Hasan et al. 2008).

Supplementary Material 

Supplementary figures S1-S11 and table S1 are available at
Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org).